Fail Over Strategy for Fault Tolerance in Cloud Computing Environment
Cloud fault tolerance is an important issue in cloud computing platforms and applications. In the event of an unexpected
system failure or malfunction, a robust fault-tolerant design may allow the cloud to continue functioning correctly,
possibly at a reduced level, instead of failing completely. To ensure high availability of critical cloud services,
application execution and hardware performance, various fault-tolerant techniques exist for building self-autonomous
cloud systems. In comparison to current approaches, this paper proposes a more robust and reliable architecture using
an optimal checkpointing strategy to ensure high system availability and reduced task service finish time. Using
pass rates and virtualised mechanisms, the proposed Smart Failover Strategy (SFS) scheme uses components such as
Cloud fault manager, Cloud controller, Cloud load balancer and a selection mechanism, providing fault tolerance via
redundancy, optimized selection and checkpointing. In our approach, the Cloud fault manager repairs faults generated
before the task time deadline is reached, blocking unrecoverable faulty nodes as well as their virtual nodes. This scheme
is also able to remove temporary software faults from recoverable faulty nodes, thereby making them available for future
requests. We argue that the proposed SFS algorithm makes the system highly fault tolerant by considering forward and
backward recovery using diverse software tools. Compared to existing approaches, preliminary experiments with the SFS
algorithm indicate an increase in pass rates and a consequent decrease in failure rates, showing good overall
performance in task allocation. We present these results using experimental validation tools with comparison to other
techniques, laying a foundation for a fully fault-tolerant IaaS Cloud environment.
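The failover behaviour described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' SFS implementation: the names `Node`, `FaultManager`, `select_node` and `run_task`, the definition of pass rate as passes divided by attempts, and the checkpoint taken after every completed step are all assumptions made for illustration. Only backward recovery (resuming from the last checkpoint) is shown, and the task deadline is not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum


class Fault(Enum):
    RECOVERABLE = "recoverable"      # e.g. a temporary software fault
    UNRECOVERABLE = "unrecoverable"  # e.g. a permanent failure


@dataclass
class Node:
    name: str
    virtual_nodes: list = field(default_factory=list)
    passes: int = 0
    attempts: int = 0

    @property
    def pass_rate(self) -> float:
        # Assumed definition: fraction of attempts that finished the task.
        return self.passes / self.attempts if self.attempts else 1.0


class FaultManager:
    """Hypothetical fault manager: blocks or repairs a faulty node."""

    def __init__(self):
        self.blocked = set()

    def handle(self, node: Node, fault: Fault) -> None:
        if fault is Fault.UNRECOVERABLE:
            # Block the node together with its virtual nodes.
            self.blocked.add(node.name)
            self.blocked.update(node.virtual_nodes)
        # A recoverable fault is treated as repaired: the node stays available.


def select_node(nodes, manager: FaultManager) -> Node:
    # Selection mechanism: best pass rate among nodes that are not blocked.
    candidates = [n for n in nodes if n.name not in manager.blocked]
    if not candidates:
        raise RuntimeError("no healthy node available")
    return max(candidates, key=lambda n: n.pass_rate)


def run_task(total_steps, nodes, manager, faults_at):
    """Run a task of total_steps steps; faults_at maps (node name, step) to a Fault."""
    pending = dict(faults_at)  # copy so the caller's mapping is not mutated
    checkpoint = 0             # last step known to be completed
    history = []               # node used for each attempt
    while checkpoint < total_steps:
        node = select_node(nodes, manager)
        node.attempts += 1
        history.append(node.name)
        failed = False
        step = checkpoint      # resume from the checkpoint, not from step 0
        while step < total_steps:
            fault = pending.pop((node.name, step), None)
            if fault is not None:
                manager.handle(node, fault)
                failed = True
                break
            step += 1
            checkpoint = step  # checkpoint after every completed step
        if not failed:
            node.passes += 1
    return history
```

In this sketch, a node that fails keeps its lower pass rate even after a recoverable fault is repaired, so the selection step prefers nodes with a better record while the repaired node remains eligible for later requests.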
A New Hybrid Fault Tolerance Approach for Internet of Things
According to the Distributed Management Task Force (DMTF), management software in the Internet of Things (IoT) should have five capabilities: Fault Tolerance, Configuration, Accounting, Performance, and Security. Given the importance of IoT management and fault tolerance capacity, this paper introduces a new fault tolerance architecture. The proposed hybrid architecture uses all of the reactive and proactive policies simultaneously in its structure. A further objective of the paper is to develop an indicator for measuring fault tolerance capacity in different architectures. The CloudSim simulator is used to evaluate and compare the proposed architecture. In addition to CloudSim, a second simulator based on the Pegasus Workflow Management System (WMS) was implemented to validate the proposed architecture. Finally, in a third step, fuzzy inference systems were designed to model and evaluate fault tolerance in the various architectures. The results confirm that combining reactive and proactive policies increases fault tolerance in the proposed architecture.
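As a rough illustration of combining proactive and reactive policies, the Python sketch below filters out nodes with a poor health score before execution (proactive) and then retries and resubmits after a failure (reactive). The abstract does not name the specific policies, so the health score, the `threshold` value, the retry count and the function name `run_with_hybrid_policy` are hypothetical choices, not the paper's architecture.

```python
def run_with_hybrid_policy(task, nodes, health, threshold=0.5, max_retries=2):
    """Run task(node) with a proactive filter and reactive retries.

    task raises RuntimeError on failure; health maps node -> score in [0, 1].
    """
    # Proactive step: avoid nodes whose health score suggests a likely failure.
    preferred = [n for n in nodes if health.get(n, 1.0) >= threshold]
    # Fall back to every node if all of them look unhealthy.
    candidates = preferred or list(nodes)

    log = []
    for node in candidates:
        # Reactive step: retry on the same node, then resubmit to the next one.
        for attempt in range(1 + max_retries):
            try:
                result = task(node)
                log.append((node, attempt, "ok"))
                return result, log
            except RuntimeError:
                log.append((node, attempt, "failed"))
    raise RuntimeError("task failed on all candidate nodes")
```

The returned log records which node and attempt finally succeeded, which is the kind of raw data a fault tolerance indicator could be computed from.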